METHOD AND APPARATUS FOR PERFORMING REDUCED
RATE VARIABLE RATE VOCODING 

BACKGROUND OF THE INVENTION 

I. Field of the Invention 

The present invention relates to communications. More particularly, 
the present invention relates to a novel and improved method and 
apparatus for performing variable rate code excited linear predictive (CELP)
coding. 

II. Description of the Related Art 

Transmission of voice by digital techniques has become widespread,
particularly in long distance and digital radio telephone applications. This,
in turn, has created interest in determining the least amount of information
that can be sent over the channel while maintaining the perceived quality
of the reconstructed speech. If speech is transmitted by simply sampling and
digitizing, a data rate on the order of 64 kilobits per second (kbps) is required
to achieve the speech quality of a conventional analog telephone. However,
through the use of speech analysis, followed by the appropriate coding,
transmission, and resynthesis at the receiver, a significant reduction in the
data rate can be achieved.

Devices which employ techniques to compress voiced speech by
extracting parameters that relate to a model of human speech generation are
typically called vocoders. Such devices are composed of an encoder, which
analyzes the incoming speech to extract the relevant parameters, and a
decoder, which resynthesizes the speech using the parameters which it
receives over the transmission channel. In order to be accurate, the model
must be constantly changing. Thus the speech is divided into blocks of
time, or analysis frames, during which the parameters are calculated. The
parameters are then updated for each new frame.

Among the various classes of speech coders, Code Excited Linear
Predictive Coding (CELP), Stochastic Coding, and Vector Excited Speech
Coding belong to one class. An example of a coding algorithm of this particular
class is described in the paper "A 4.8kbps Code Excited Linear Predictive
Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite
Conference, 1988.



The function of the vocoder is to compress the digitized speech signal
into a low bit rate signal by removing all of the natural redundancies
inherent in speech. Speech typically has short term redundancies due
primarily to the filtering operation of the vocal tract, and long term
redundancies due to the excitation of the vocal tract by the vocal cords. In a
CELP coder, these operations are modeled by two filters, a short term
formant filter and a long term pitch filter. Once these redundancies are
removed, the resulting residual signal can be modeled as white Gaussian
noise, which also must be encoded. The basis of this technique is to compute
the parameters of a filter, called the LPC filter, which performs short-term
prediction of the speech waveform using a model of the human vocal tract.
In addition, long-term effects, related to the pitch of the speech, are modeled
by computing the parameters of a pitch filter, which essentially models the
human vocal cords. Finally, these filters must be excited, and this is done
by determining which one of a number of random excitation waveforms in
a codebook results in the closest approximation to the original speech when
the waveform excites the two filters mentioned above. Thus the
transmitted parameters relate to three items: (1) the LPC filter, (2) the pitch
filter and (3) the codebook excitation.

Although the use of vocoding techniques furthers the objective of
reducing the amount of information sent over the channel
while maintaining quality reconstructed speech, other techniques need to be
employed to achieve further reduction. One technique previously used to
reduce the amount of information sent is voice activity gating. In this
technique no information is transmitted during pauses in speech.
Although this technique achieves the desired result of data reduction, it
suffers from several deficiencies.

In many cases, the quality of speech is reduced due to clipping of the
initial parts of words. Another problem with gating the channel off during
inactivity is that the system users perceive the lack of the background noise
which normally accompanies speech and rate the quality of the channel as
lower than a normal telephone call. A further problem with activity gating
is that occasional sudden noises in the background may trigger the
transmitter when no speech occurs, resulting in annoying bursts of noise at
the receiver.

In an attempt to improve the quality of the synthesized speech in
voice activity gating systems, synthesized comfort noise is added during the
decoding process. Although some improvement in quality is achieved
from adding comfort noise, it does not substantially improve the overall
quality since the comfort noise does not model the actual background noise
at the encoder.

A preferred technique to accomplish data compression, so as to result
in a reduction of information that needs to be sent, is to perform variable
rate vocoding. Since speech inherently contains periods of silence, i.e.
pauses, the amount of data required to represent these periods can be
reduced. Variable rate vocoding most effectively exploits this fact by
reducing the data rate for these periods of silence. A reduction in the data
rate, as opposed to a complete halt in data transmission, for periods of
silence overcomes the problems associated with voice activity gating while
facilitating a reduction in transmitted information.

Copending U.S. Patent Application Serial No. 08/004,484, filed
January 14, 1993, entitled "Variable Rate Vocoder", assigned to the
assignee of the present invention and incorporated by reference herein,
details a vocoding algorithm of the previously mentioned class of speech
coders: Code Excited Linear Predictive Coding (CELP), Stochastic Coding or
Vector Excited Speech Coding. The CELP technique by itself does provide a
significant reduction in the amount of data necessary to represent speech in
a manner that upon resynthesis results in high quality speech. As
mentioned previously, the vocoder parameters are updated for each frame.
The vocoder detailed in the copending patent application provides a
variable output data rate by changing the frequency and precision of the
model parameters.

The vocoding algorithm of the above mentioned patent application
differs most markedly from the prior CELP techniques by producing a
variable output data rate based on speech activity. The structure is defined
so that the parameters are updated less often, or with less precision, during
pauses in speech. This technique allows for an even greater decrease in the
amount of information to be transmitted. The phenomenon which is
exploited to reduce the data rate is the voice activity factor, which is the
average percentage of time a given speaker is actually talking during a
conversation. For typical two-way telephone conversations, the average
data rate is reduced by a factor of 2 or more. During pauses in speech, only
background noise is being coded by the vocoder. At these times, some of the
parameters relating to the human vocal tract model need not be
transmitted.

As mentioned previously, a prior approach to limiting the amount of
information transmitted during silence is called voice activity gating, a
technique in which no information is transmitted during moments of
silence. On the receiving side the period may be filled in with synthesized
"comfort noise". In contrast, a variable rate vocoder is continuously
transmitting data which, in the exemplary embodiment of the copending
application, is at rates which range between approximately 8 kbps and
1 kbps. A vocoder which provides a continuous transmission of data
eliminates the need for synthesized "comfort noise", with the coding of the
background noise providing a more natural quality to the synthesized
speech. The invention of the aforementioned patent application therefore
provides a significant improvement in synthesized speech quality over that
of voice activity gating by allowing a smooth transition between speech and
background.

Because the vocoding algorithm of the above mentioned patent application
enables short pauses in speech to be detected, a decrease in the effective
voice activity factor is realized. Rate decisions can be made on a frame by
frame basis with no hangover, so the data rate may be lowered for pauses in
speech as short as the frame duration, typically 20 msec. Therefore pauses
such as those between syllables may be captured. This technique decreases
the voice activity factor beyond what has traditionally been considered, as
not only long duration pauses between phrases, but also shorter pauses can
be encoded at lower rates.

Since rate decisions are made on a frame basis, there is no clipping of
the initial part of the word, such as in a voice activity gating system.
Clipping of this nature occurs in voice activity gating systems due to a delay
between detection of the speech and a restart in transmission of data. Use of
a rate decision based upon each frame results in speech where all transitions
have a natural sound.

With the vocoder always transmitting, the speaker's ambient
background noise will continually be heard on the receiving end thereby
yielding a more natural sound during speech pauses. The present
invention thus provides a smooth transition to background noise. What
the listener hears in the background during speech will not suddenly
change to a synthesized comfort noise during pauses as in a voice activity
gating system.

Since background noise is continually vocoded for transmission,
interesting events in the background can be sent with full clarity. In certain
cases the interesting background noise may even be coded at the highest
rate. Maximum rate coding may occur, for example, when there is someone
talking loudly in the background, or if an ambulance drives by a user
standing on a street corner. Constant or slowly varying background noise
will, however, be encoded at low rates.

The use of variable rate vocoding has the promise of increasing the
capacity of a Code Division Multiple Access (CDMA) based digital cellular
telephone system by more than a factor of two. CDMA and variable rate
vocoding are uniquely matched, since, with CDMA, the interference
between channels drops automatically as the rate of data transmission over
any channel decreases. In contrast, consider systems in which transmission
slots are assigned, such as TDMA or FDMA. In order for such a system to
take advantage of any drop in the rate of data transmission, external
intervention is required to coordinate the reassignment of unused slots to
other users. The inherent delay in such a scheme implies that the channel
may be reassigned only during long speech pauses. Therefore, full
advantage cannot be taken of the voice activity factor. However, with
external coordination, variable rate vocoding is useful in systems other than
CDMA because of the other mentioned reasons.

In a CDMA system speech quality can be slightly degraded at times
when extra system capacity is desired. Abstractly speaking, the vocoder can
be thought of as multiple vocoders all operating at different rates with
different resultant speech qualities. Therefore the speech qualities can be
mixed in order to further reduce the average rate of data transmission.
Initial experiments show that by mixing full and half rate vocoded speech,
e.g. the maximum allowable data rate is varied on a frame by frame basis
between 8 kbps and 4 kbps, the resulting speech has a quality which is better
than half rate variable, 4 kbps maximum, but not as good as full rate
variable, 8 kbps maximum.

It is well known that in most telephone conversations, only one
person talks at a time. As an additional function for full-duplex telephone
links a rate interlock may be provided. If one direction of the link is
transmitting at the highest transmission rate, then the other direction of the
link is forced to transmit at the lowest rate. An interlock between the two
directions of the link can guarantee no greater than 50% average utilization
of each direction of the link. However, when the channel is gated off, such
as the case for a rate interlock in activity gating, there is no way for a listener
to interrupt the talker to take over the talker role in the conversation. The
vocoding method of the above mentioned patent application readily
provides the capability of an adaptive rate interlock by control signals which
set the vocoding rate.



WO 96/04646    PCT/US95/09780

In the above mentioned patent application the vocoder operates at
either full rate when speech is present or eighth rate when speech is not
present. The operation of the vocoding algorithm at half and quarter rates
is reserved for special conditions of impacted capacity or when other data is
to be transmitted in parallel with speech data.

Copending U.S. Patent Application Serial No. 08/118,473, filed
September 8, 1993, entitled "Method and Apparatus for Determining the
Transmission Data Rate in a Multi-User Communication System",
assigned to the assignee of the present invention and incorporated by
reference herein, details a method by which a communication system, in
accordance with system capacity measurements, limits the average data rate
of frames encoded by a variable rate vocoder. The system reduces the
average data rate by forcing predetermined frames in a string of full rate
frames to be coded at a lower rate, i.e. half rate. The problem with reducing
the encoding rate for active speech frames in this fashion is that the limiting
does not correspond to any characteristics of the input speech and so is not
optimized for speech compression quality.

Also, in copending U.S. Patent Application Serial No. 07/984,602,
filed December 2, 1992, entitled "Improved Method for Determining Speech
Encoding Rate in a Variable Rate Vocoder", now U.S. Patent No. 5,341,456,
issued August 23, 1994, assigned to the assignee of the present
invention and incorporated by reference herein, a method for
distinguishing unvoiced speech from voiced speech is disclosed. The
method disclosed examines the energy of the speech and the spectral tilt of
the speech and uses the spectral tilt to distinguish unvoiced speech from
background noise.

Variable rate vocoders that vary the encoding rate based entirely on
the voice activity of the input speech fail to realize the compression
efficiency of a variable rate coder that varies the encoding rate based on the
complexity or information content that is dynamically varying during
active speech. By matching the encoding rates to the complexity of the
input waveform, more efficient speech coders can be built. Furthermore,
systems that seek to dynamically adjust the output data rate of the variable
rate vocoders should vary the data rates in accordance with characteristics of
the input speech to attain an optimal voice quality for a desired average data
rate.
SUMMARY OF THE INVENTION

The present invention is a novel and improved method and
apparatus for encoding active speech frames at a reduced data rate by
encoding speech frames at rates between a predetermined maximum rate
and a predetermined minimum rate. The present invention designates a
set of active speech operation modes. In the exemplary embodiment of the
present invention, there are four active speech operation modes: full rate
speech, half rate speech, quarter rate unvoiced speech and quarter rate
voiced speech.

It is an objective of the present invention to provide an optimized
method for selecting an encoding mode that provides rate efficient coding of
the input speech. It is a second objective of the present invention to identify
a set of parameters ideally suited for this operational mode selection and to
provide a means for generating this set of parameters. Third, it is an
objective of the present invention to provide identification of two separate
conditions that allow low rate coding with minimal sacrifice to quality. The
two conditions are the presence of unvoiced speech and the presence of
temporally masked speech. It is a fourth objective of the present invention
to provide a method for dynamically adjusting the average output data rate
of the speech coder with minimal impact on speech quality.

The present invention provides a set of rate decision criteria referred
to as mode measures. A first mode measure is the target matching signal to
noise ratio (TMSNR) from the previous encoding frame, which provides
information on how well the synthesized speech matches the input speech
or, in other words, how well the encoding model is performing. A second
mode measure is the normalized autocorrelation function (NACF), which
measures periodicity in the speech frame. A third mode measure is the zero
crossings (ZC) parameter, which is a computationally inexpensive method
for measuring high frequency content in an input speech frame. A fourth
measure is the prediction gain differential (PGD), which determines if the LPC
model is maintaining its prediction efficiency. The fifth measure is the
energy differential (ED), which compares the energy in the current frame to
an average frame energy.

The exemplary embodiment of the vocoding algorithm of the present
invention uses the five mode measures enumerated above to select an
encoding mode for an active speech frame. The rate determination logic of
the present invention compares the NACF against a first threshold value
and the ZC against a second threshold value to determine if the speech
should be coded as unvoiced quarter rate speech.

If it is determined that the active speech frame contains voiced
speech, then the vocoder examines the parameter ED to determine if the
speech frame should be coded as quarter rate voiced speech. If it is
determined that the speech is not to be coded at quarter rate, then the
vocoder tests if the speech can be coded at half rate. The vocoder tests the
values of TMSNR, PGD and NACF to determine if the speech frame can be
coded at half rate. If it is determined that the active speech frame cannot be
coded at quarter or half rates, then the frame is coded at full rate.

It is further an objective to provide a method for dynamically
changing threshold values in order to accommodate rate requirements. By
varying one or more of the mode selection thresholds it is possible to
increase or decrease the average data transmission rate. Thus, by dynamically
adjusting the threshold values, the output rate can be adjusted.

BRIEF DESCRIPTION OF THE DRAWINGS 

The features, objects, and advantages of the present invention will
become more apparent from the detailed description set forth below when
taken in conjunction with the drawings in which like reference characters
identify correspondingly throughout and wherein:

Figure 1 is a block diagram of the encoding rate determination
apparatus of the present invention; and
Figure 2 is a flowchart illustrating the encoding rate selection process
of the rate determination logic.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

In the exemplary embodiment, speech frames of 160 speech samples
are encoded. In the exemplary embodiment of the present invention, there
are four data rates: full rate, half rate, quarter rate and eighth rate. Full rate
corresponds to an output data rate of 14.4 kbps. Half rate corresponds to an
output data rate of 7.2 kbps. Quarter rate corresponds to an output data rate
of 3.6 kbps. Eighth rate corresponds to an output data rate of 1.8 kbps, and is
reserved for transmission during periods of silence.
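
By way of illustration only, the relationship between the data rates above and the number of bits available per frame can be sketched as follows. The sketch assumes the 8000 samples per second sampling frequency mentioned later in this description; the names `bits_per_frame` and `RATES_KBPS` are hypothetical and introduced only for this example.

```python
# Illustrative sketch: bits available per 160-sample frame at each data rate.
# Assumption: 8000 samples/s sampling frequency (stated later in the description).
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 160

RATES_KBPS = {"full": 14.4, "half": 7.2, "quarter": 3.6, "eighth": 1.8}


def bits_per_frame(rate_kbps: float) -> int:
    """Return the number of bits carried by one frame at the given rate."""
    frame_seconds = FRAME_SAMPLES / SAMPLE_RATE_HZ  # 0.02 s, i.e. 20 msec
    return round(rate_kbps * 1000 * frame_seconds)


for name, kbps in RATES_KBPS.items():
    print(f"{name:8s} {kbps:5.1f} kbps -> {bits_per_frame(kbps)} bits per frame")
```

Under these assumptions a full rate frame carries 288 bits and each lower rate carries half as many bits as the rate above it.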

It should be noted that the present invention relates only to the
coding of active speech frames, that is, frames that are detected to have speech
present in them. The method for detecting the presence of speech is
detailed in the aforementioned U.S. Patent Application Serial Nos.
08/004,484 and 07/984,602.

Referring to Figure 1, mode measurement element 12 determines
values of five parameters used by rate determination logic 14 to select an
encoding rate for the active speech frame. In the exemplary embodiment,
mode measurement element 12 determines five parameters which it
provides to rate determination logic 14. Based on the parameters provided
by mode measurement element 12, rate determination logic 14 selects an
encoding rate of full rate, half rate or quarter rate.

Rate determination logic 14 selects one of four encoding modes in
accordance with the five generated parameters. The four modes of encoding
include full rate mode, half rate mode, quarter rate unvoiced mode and
quarter rate voiced mode. Quarter rate voiced mode and quarter rate
unvoiced mode provide data at the same rate but by means of different
encoding strategies. Half rate mode is used to code stationary, periodic, well
modeled speech. Quarter rate voiced, quarter rate unvoiced, and half
rate modes all take advantage of portions of speech that do not require high
precision in the coding of the frame.

Quarter rate unvoiced mode is used in the coding of unvoiced
speech. Quarter rate voiced mode is used in the coding of temporally
masked speech frames. Most CELP speech coders take advantage of
simultaneous masking, in which speech energy at a given frequency masks
out noise energy at the same frequency and time, making the noise
inaudible. Variable rate speech coders can take advantage of temporal
masking, in which low energy active speech frames are masked by preceding
high energy speech frames of similar frequency content. Because the
human ear integrates energy over time in various frequency bands, low
energy frames are time averaged with the high energy frames, thus lowering
the coding requirements for the low energy frames. Taking advantage of
this temporal masking auditory phenomenon allows the variable rate speech
coder to reduce the encoding rate during this mode of speech. This
psychoacoustic phenomenon is detailed in Psychoacoustics by E. Zwicker
and H. Fastl, pp. 56-101.

Mode measurement element 12 receives four input signals with
which it generates the five mode parameters. The first signal that mode
measurement element 12 receives is S(n), which is the uncoded input
speech samples. In the exemplary embodiment, the speech samples are
provided in frames containing 160 samples of speech. The speech frames
that are provided to mode measurement element 12 all contain active
speech. During periods of silence, the active speech rate determination
system of the present invention is inactive.

The second signal that mode measurement element 12 receives is the
synthesized speech signal, Ŝ(n), which is the decoded speech from the
encoder's decoder of the variable rate CELP coder. The encoder's decoder
decodes a frame of encoded speech for the purpose of updating filter
parameters and memories in an analysis-by-synthesis based CELP coder. The
design of such decoders is well known in the art and is detailed in the
above mentioned U.S. Patent Application Serial No. 08/004,484.

The third signal that mode measurement element 12 receives is the
formant residual signal e(n). The formant residual signal is the speech
signal S(n) filtered by the linear prediction coding (LPC) filter of the CELP
coder. The design of LPC filters and the filtering of signals by such filters are
well known in the art and detailed in the above mentioned U.S. Patent
Application Serial No. 08/004,484. The fourth input to mode measurement
element 12 is A(z), which are the filter tap values of the perceptual
weighting filter of the associated CELP coder. The generation of the tap
values and the filtering operation of a perceptual weighting filter are well
known in the art and are detailed in U.S. Patent Application Serial No.
08/004,484.

Target matching signal to noise ratio (SNR) computation element 2
receives the synthesized speech signal, Ŝ(n), the speech samples S(n), and a
set of perceptual weighting filter tap values A(z). Target matching SNR
computation element 2 provides a parameter, denoted TMSNR, which
indicates how well the speech model is tracking the input speech. Target
matching SNR computation element 2 generates TMSNR in accordance
with equation 1 below:



$$\mathrm{TMSNR} = 10 \log \left( \frac{\sum_{n=0}^{159} S_w^2(n)}{\sum_{n=0}^{159} \left( S_w(n) - \hat{S}_w(n) \right)^2} \right) \qquad (1)$$

where the subscript w denotes that the signal has been filtered by a perceptual
weighting filter.
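
As a non-limiting sketch of equation 1, the following assumes that both input sequences have already been passed through the perceptual weighting filter and that the logarithm is base 10 (so that the result is in dB). The function name `tmsnr_db` and the handling of zero energy are assumptions made for this example only.

```python
import math


def tmsnr_db(weighted_speech, weighted_synth):
    """Target matching SNR in dB for one frame.

    Both arguments are assumed to be perceptually weighted already:
    weighted_speech is S_w(n) and weighted_synth is the weighted
    synthesized speech, each holding one frame of samples.
    """
    if len(weighted_speech) != len(weighted_synth):
        raise ValueError("frames must have the same length")
    signal = sum(s * s for s in weighted_speech)
    noise = sum((s - y) ** 2 for s, y in zip(weighted_speech, weighted_synth))
    if noise == 0.0:
        return math.inf    # perfect match (assumed convention)
    if signal == 0.0:
        return -math.inf   # no target energy (assumed convention)
    return 10.0 * math.log10(signal / noise)


# A synthesized frame that is 10% low in amplitude gives roughly 20 dB.
print(tmsnr_db([1.0] * 160, [0.9] * 160))
```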



Note that this measure is computed for the previous frame of speech, while
the NACF, PGD, ED and ZC are computed on the current frame of speech.
TMSNR is computed on the previous frame of speech since it is a function
of the selected encoding rate and thus, for computational complexity reasons,
it is computed on the frame preceding the frame being encoded.

The design and implementation of perceptual weighting filters is
well known in the art and is detailed in the aforementioned U.S. Patent
Application Serial No. 08/004,484. It should be noted that perceptual
weighting is preferred in order to weight the perceptually significant features
of the speech frame. However, it is envisioned that the measurement could be
made without perceptually weighting the signals.
Normalized autocorrelation computation element 4 receives the
formant residual signal, e(n). The function of normalized autocorrelation
computation element 4 is to provide an indication of the periodicity of
samples in the speech frame. Normalized autocorrelation element 4
generates a parameter, denoted NACF, in accordance with equation 2 below:



It should be noted that the generation of this parameter requires memory of
the formant residual signal from the encoding of the previous frame. This
allows testing not only of the periodicity of the current frame, but also of
the periodicity of the current frame with the previous frame.

The reason that, in the preferred embodiment, the formant residual
signal, e(n), is used in generating NACF instead of the speech samples, S(n),
which could also be used, is to eliminate the interaction of the formants of the
speech signal. Passing the speech signal through the formant filter serves to
flatten the speech envelope and thus whiten the resulting signal. It
should be noted that the values of delay T in the exemplary embodiment
correspond to pitch frequencies between 66 Hz and 400 Hz for a sampling
frequency of 8000 samples per second. The pitch frequency for a given delay
value T is calculated by equation 3 below:



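$$\mathrm{NACF} = \max_{T \in [20,120]} \left( \frac{\sum_{n=0}^{159} e(n)\, e(n-T)}{\frac{1}{2} \left[ \sum_{n=0}^{159} e^2(n) + \sum_{n=0}^{159} e^2(n-T) \right]} \right) \qquad (2)$$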

$$f_{pitch} = \frac{f_s}{T}, \text{ where } f_s \text{ is the sampling frequency.} \qquad (3)$$
It should be noted that the frequency range can be extended or reduced
simply by selecting a different set of delay values. It should also be noted
that the present invention is equally applicable to any sampling frequency.
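
For illustration, a normalized autocorrelation search over the delays T = 20 to 120 and the delay-to-pitch conversion of equation 3 might be sketched as below. The normalization used here (the mean of the energies of the current and delayed segments) is an assumption of this sketch, as are the names `nacf` and `pitch_frequency_hz` and the use of a separate `history` buffer holding the residual of the previous frame.

```python
import math


def nacf(residual, history, min_lag=20, max_lag=120):
    """Peak normalized autocorrelation of the formant residual.

    residual: e(0..N-1) for the current frame.
    history:  at least max_lag residual samples from the previous frame,
              oldest first, so that e(n - T) exists for every n and T.
    """
    if len(history) < max_lag:
        raise ValueError("history must hold at least max_lag samples")
    buf = list(history) + list(residual)
    start = len(history)               # index of e(0) inside buf
    frame_len = len(residual)
    energy_now = sum(x * x for x in residual)
    best = None
    for lag in range(min_lag, max_lag + 1):
        cross = 0.0
        energy_lag = 0.0
        for n in range(frame_len):
            delayed = buf[start + n - lag]
            cross += buf[start + n] * delayed
            energy_lag += delayed * delayed
        # Assumed normalization: mean of current and delayed energies.
        denom = 0.5 * (energy_now + energy_lag)
        if denom > 0.0:
            value = cross / denom
            if best is None or value > best:
                best = value
    return 0.0 if best is None else best


def pitch_frequency_hz(lag, sample_rate_hz=8000.0):
    """Equation 3: pitch frequency corresponding to a delay of `lag` samples."""
    return sample_rate_hz / lag


# A residual that repeats every 40 samples is almost perfectly periodic.
signal = [math.sin(2.0 * math.pi * k / 40.0) for k in range(-120, 160)]
peak = nacf(residual=signal[120:], history=signal[:120])
print(peak, pitch_frequency_hz(20), pitch_frequency_hz(120))
```

The two delay limits map to 400 Hz and approximately 66.7 Hz at 8000 samples per second, which agrees with the frequency range stated above.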
Zero crossings counter 6 receives the speech samples S(n) and counts
the number of times the speech samples change sign. This is a
computationally inexpensive method of detecting high frequency
components in the speech signal. This counter can be implemented in
software by a loop of the form:

cnt=0                              (4)

for n=0,158                        (5)
    if (S(n)*S(n+1)<0) cnt++       (6)

The loop of equations 4-6 multiplies consecutive speech samples and tests if
the product is less than zero, indicating that the sign of the two
consecutive samples differs. This assumes that there is no DC component
in the speech signal. It is well known in the art how to remove DC
components from signals.
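
The loop above can be sketched in runnable form as follows. The mean-subtraction helper `remove_dc` is only one simple way of removing a DC component and is an assumption of this example, not a method prescribed by this description; both function names are hypothetical.

```python
def zero_crossings(samples):
    """Count sign changes between consecutive samples of a list.

    Mirrors the loop of equations 4-6: a crossing is counted only when the
    product of two neighbouring samples is strictly negative, so an exact
    zero sample does not count.
    """
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


def remove_dc(samples):
    """Subtract the frame mean (an assumed, simple DC removal)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


frame = [11, 9, 11, 9]                    # alternating signal riding on a DC offset
print(zero_crossings(frame))              # offset hides the crossings
print(zero_crossings(remove_dc(frame)))   # crossings visible after DC removal
```

For a 160-sample frame the comparison runs over the 159 consecutive pairs, matching the loop bounds n = 0 to 158.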

Prediction gain differential element 8 receives the speech signal S(n)
and the formant residual signal e(n). Prediction gain differential element 8
generates a parameter denoted PGD, which determines if the LPC model is
maintaining its prediction efficiency. Prediction gain differential element 8
generates the prediction gain, Pg, in accordance with equation 7 below:

$$P_g = \frac{\sum_{n=0}^{159} S^2(n)}{\sum_{n=0}^{159} e^2(n)} \qquad (7)$$

The prediction gain of the present frame is then compared against the 
prediction gain of the previous frame in generating the output parameter 
PGD by equation 8 below: 



$$\mathrm{PGD} = 10 \log \left( \frac{P_g(i)}{P_g(i-1)} \right), \text{ where } i \text{ denotes the frame number.} \qquad (8)$$




In a preferred embodiment, prediction gain differential element 8 does not
generate the prediction gain values Pg. In the generation of the LPC
coefficients, a byproduct of Durbin's recursion is the prediction gain Pg, so
no repetition of the computation is necessary.
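
A direct computation of the prediction gain and its frame-to-frame differential can be sketched as follows, for cases where the value is not taken from Durbin's recursion. The function names and the base 10 logarithm are assumptions of this example.

```python
import math


def prediction_gain(speech, residual):
    """Ratio of speech energy to formant residual energy for one frame."""
    residual_energy = sum(e * e for e in residual)
    if residual_energy == 0.0:
        raise ValueError("residual energy is zero; prediction gain undefined")
    return sum(s * s for s in speech) / residual_energy


def pgd_db(gain_current, gain_previous):
    """Prediction gain differential in dB between consecutive frames.

    Both gains must be positive.
    """
    return 10.0 * math.log10(gain_current / gain_previous)


# A tenfold drop in prediction gain from one frame to the next gives -10 dB.
print(prediction_gain([2.0] * 160, [1.0] * 160))
print(pgd_db(10.0, 100.0))
```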

Frame energy differential element 10 receives the speech samples S(n)
of the present frame and computes the energy of the speech signal in the
present frame in accordance with equation 9 below:

$$E_i = \sum_{n=0}^{159} S^2(n) \qquad (9)$$

The energy of the present frame is compared to an average energy of
previous frames, Eave. In the exemplary embodiment, the average energy,
Eave, is generated by a leaky integrator of the form:

$$E_{ave} = \alpha \cdot E_{ave} + (1-\alpha) \cdot E_i, \text{ where } 0 < \alpha < 1 \qquad (10)$$

The factor α determines the range of frames that are relevant in the
computation. In the exemplary embodiment, α is set to 0.8825, which
provides a time constant of 8 frames. Frame energy differential element 10
then generates the parameter ED in accordance with equation 11 below:

$$\mathrm{ED} = 10 \log \left( \frac{E_i}{E_{ave}} \right) \qquad (11)$$
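
The frame energy, leaky integrator and energy differential can be sketched together as below. Several details are assumptions of this sketch rather than statements of this description: ED is computed against the average of the previous frames before that average is updated, the initial average is supplied by the caller, and the logarithm is base 10. The class and function names are hypothetical.

```python
import math

ALPHA = 0.8825  # leaky integrator factor of the exemplary embodiment


class EnergyDifferential:
    """Tracks the average frame energy and reports ED for each new frame."""

    def __init__(self, initial_average, alpha=ALPHA):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        if initial_average <= 0.0:
            raise ValueError("initial_average must be positive")
        self.alpha = alpha
        self.average = initial_average   # assumed starting value of Eave

    def update(self, frame):
        """Return ED in dB for `frame`, then fold its energy into Eave."""
        energy = sum(s * s for s in frame)                  # frame energy
        if energy > 0.0:
            ed = 10.0 * math.log10(energy / self.average)   # ED vs. prior average
        else:
            ed = -math.inf                                  # silent frame (assumed convention)
        self.average = self.alpha * self.average + (1.0 - self.alpha) * energy
        return ed


def time_constant_frames(alpha):
    """Frames for the integrator's memory of an old value to decay by 1/e."""
    return -1.0 / math.log(alpha)


tracker = EnergyDifferential(initial_average=160.0)
print(tracker.update([0.1] * 160))     # frame about 20 dB below the running average
print(time_constant_frames(ALPHA))     # close to 8 frames
```

The last line illustrates why a factor of 0.8825 corresponds to a time constant of about 8 frames.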

The five parameters, TMSNR, NACF, ZC, PGD, and ED, are provided
to rate determination logic 14. Rate determination logic 14 selects an
encoding rate for the next frame of samples in accordance with the
parameters and a predetermined set of selection rules. Referring now to
Figure 2, a flow diagram illustrating the rate selection process of rate
determination logic element 14 is shown.
The rate determination process begins in block 18. In block 20, the
output of normalized autocorrelation element 4, NACF, is compared
against a predetermined threshold value, THR1, and the output of zero
crossings counter 6, ZC, is compared against a second predetermined threshold,
THR2. If NACF is less than THR1 and ZC is greater than THR2, then the
flow proceeds to block 22, which encodes the speech as quarter rate
unvoiced. NACF being less than a predetermined threshold would indicate
a lack of periodicity in the speech and ZC being greater than a
predetermined threshold would indicate a high frequency component in the
speech. The combination of these two conditions indicates that the frame
contains unvoiced speech. In the exemplary embodiment THR1 is 0.35 and
THR2 is 50 zero crossings. If NACF is not less than THR1 or ZC is not greater
than THR2, then the flow proceeds to block 24.

In block 24, the output of frame energy differential element 10, ED, is
compared against a third threshold value, THR3. If ED is less than THR3,
then the current speech frame will be encoded as quarter rate voiced speech
in block 26. If the energy of the current frame is lower than
the average by more than a threshold amount, then a condition of
temporally masked speech is indicated. In the exemplary embodiment,
THR3 is -14 dB. If ED is not less than THR3, then the flow proceeds to
block 28.

In block 28, the output of target matching SNR computation
element 2, TMSNR, is compared to a fourth threshold value, THR4; the
output of prediction gain differential element 8, PGD, is compared against a
fifth threshold value, THR5; and the output of normalized autocorrelation
computation element 4, NACF, is compared against a sixth threshold value,
THR6. If TMSNR exceeds THR4, PGD is less than THR5, and NACF exceeds
THR6, then the flow proceeds to block 30 and the speech is coded at half rate.
TMSNR exceeding its threshold indicates that the model and the speech
being modeled were matching well in the previous frame. The parameter
PGD being less than its predetermined threshold indicates that the LPC model
is maintaining its prediction efficiency. The parameter NACF exceeding its
predetermined threshold indicates that the frame contains speech
that is periodic with the previous frame of speech.

In the exemplary embodiment, THR4 is initially set to 10 dB, THR5 is
set to -5 dB, and THR6 is set to 0.4. In block 28, if TMSNR does not exceed
THR4, or PGD is not less than THR5, or NACF does not exceed THR6, then
the flow proceeds to block 32 and the current speech frame will be encoded
at full rate.
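
The decision flow of blocks 20 through 32 can be summarized in a short sketch. The enum, function name, and argument order below are hypothetical; the threshold constants are the exemplary values quoted in the text, and THR4 is passed as an argument because it is the threshold that is later adapted.

```python
from enum import Enum


class Rate(Enum):
    QUARTER_UNVOICED = "quarter rate unvoiced"
    QUARTER_VOICED = "quarter rate voiced"
    HALF = "half rate"
    FULL = "full rate"


# Exemplary threshold values quoted in the description.
THR1 = 0.35   # NACF threshold for the unvoiced test
THR2 = 50     # zero crossings threshold
THR3 = -14.0  # ED threshold, dB
THR4 = 10.0   # initial TMSNR threshold, dB
THR5 = -5.0   # PGD threshold, dB
THR6 = 0.4    # NACF threshold for the half rate test


def select_rate(tmsnr, nacf, zc, pgd, ed, thr4=THR4):
    """Apply the tests of blocks 20, 24 and 28 in order."""
    # Block 20: low periodicity and many zero crossings indicate unvoiced speech.
    if nacf < THR1 and zc > THR2:
        return Rate.QUARTER_UNVOICED
    # Block 24: energy well below the running average indicates temporal masking.
    if ed < THR3:
        return Rate.QUARTER_VOICED
    # Block 28: good model match, stable prediction gain, and periodic speech.
    if tmsnr > thr4 and pgd < THR5 and nacf > THR6:
        return Rate.HALF
    # Block 32: everything else is encoded at full rate.
    return Rate.FULL
```

For example, `select_rate(12.0, 0.6, 10, -6.0, 0.0)` passes all three block 28 tests and yields half rate, whereas lowering TMSNR to 8 dB sends the same frame to full rate.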

By dynamically adjusting the threshold values, an arbitrary overall
data rate can be achieved. The overall active speech average data rate, R, can
be defined for an analysis window of W active speech frames as:

$R = \frac{R_f \cdot \#R_f\ \text{frames} + R_h \cdot \#R_h\ \text{frames} + R_q \cdot \#R_q\ \text{frames}}{W}$ (12)



where Rf is the data rate for frames encoded at full rate,
Rh is the data rate for frames encoded at half rate,
Rq is the data rate for frames encoded at quarter rate, and
W = #Rf frames + #Rh frames + #Rq frames.


By multiplying each of the encoding rates by the number of frames encoded
at that rate and then dividing by the total number of frames in the sample,
an average data rate for the sample of active speech may be computed. It is
important to have a frame sample size, W, large enough to prevent a long
duration of unvoiced speech, such as drawn out "s" sounds, from distorting
the average rate statistic. In the exemplary embodiment, the frame sample
size, W, for the calculation of the average rate is 400 frames.
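
As a worked illustration of the average rate calculation, the sketch below evaluates it for one window. The function name is hypothetical, the default rates are the 14.4, 7.2 and 3.6 kbps values quoted for the exemplary system, and the frame counts in the usage note are invented purely for the example.

```python
def average_rate(n_full, n_half, n_quarter,
                 r_full=14.4, r_half=7.2, r_quarter=3.6):
    """Average active speech data rate in kbps over a window of W frames."""
    w = n_full + n_half + n_quarter  # W = total active speech frames
    if w == 0:
        raise ValueError("window contains no active speech frames")
    return (r_full * n_full + r_half * n_half + r_quarter * n_quarter) / w
```

For a 400 frame window with 150 full rate, 150 half rate and 100 quarter rate frames, the result is (2160 + 1080 + 360) / 400 = 9.0 kbps.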

The average data rate may be decreased by encoding at half rate more of
the frames that would otherwise be encoded at full rate, and conversely the
average data rate may be increased by encoding at full rate more of the
frames that would otherwise be encoded at half rate. In a preferred
embodiment, the threshold that is adjusted to effect this change is THR4. In
the exemplary embodiment, a histogram of the values of TMSNR is stored. In
the exemplary embodiment, the stored TMSNR values are quantized into
values an integral number of decibels from the current value of THR4. By
maintaining a histogram of this sort, it can easily be estimated how many
frames would have changed in the previous analysis block from being
encoded at full rate to being encoded at half rate were THR4 to be
decreased by an integral number of decibels. Conversely, an estimate can be
made of how many frames encoded at half rate would be encoded at full rate
were the threshold to be increased by an integral number of decibels.
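
One way to picture the histogram is sketched below. The names and the bin convention are assumptions made for illustration: offset k collects frames with THR4 + k < TMSNR <= THR4 + k + 1, and the input is assumed to contain only frames whose rate actually depends on THR4 (those that reached block 28 and satisfied the PGD and NACF tests).

```python
import math
from collections import Counter


def build_offset_histogram(tmsnr_values, thr4):
    """Count frames by whole-dB offset from THR4 (bin k: k < TMSNR - THR4 <= k + 1)."""
    return Counter(math.ceil(v - thr4) - 1 for v in tmsnr_values)


def frames_full_to_half(hist, decrease_db):
    """Estimated frames that would exceed THR4 if it were lowered by decrease_db."""
    return sum(n for off, n in hist.items() if -decrease_db <= off < 0)


def frames_half_to_full(hist, increase_db):
    """Estimated frames that would no longer exceed THR4 if it were raised by increase_db."""
    return sum(n for off, n in hist.items() if 0 <= off < increase_db)
```

With THR4 at 10 dB and TMSNR values of 7.5, 8.2, 9.9, 10.5, 11.3 and 13.0 dB, lowering the threshold by 2 dB would move two frames (8.2 and 9.9) to half rate, and raising it by 1 dB would move one frame (10.5) to full rate.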

The number of frames that should change from half rate frames to full
rate frames is determined by the equation:

$\Delta = \frac{(\text{target rate} - \text{average rate}) \cdot W}{R_f - R_h}$ (13)

where Δ is the number of frames encoded at half rate that should be
encoded at full rate in order to attain the target rate, and
W = #Rf frames + #Rh frames + #Rq frames.

$TMSNR_{NEW} = TMSNR_{OLD}$ + (the number of dB from $TMSNR_{OLD}$ to achieve
the Δ frame differences defined in equation 13 above)

Note that the initial value of the TMSNR threshold is a function of the
target rate desired. In an exemplary embodiment with a target rate of
8.7 kbps, in a system with Rf=14.4 kbps, Rh=7.2 kbps, and Rq=3.6 kbps, the
initial value of the TMSNR threshold is 10 dB. It should be noted that the
quantization of the TMSNR values to an integral number of decibels from the
threshold THR4 can easily be made finer, such as half or quarter decibels,
or coarser, such as one and a half or two decibels.
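
The threshold update can be sketched as a two step calculation: compute the frame count of equation 13, then walk a histogram of whole-dB offsets until enough frames have moved. Everything named below is hypothetical. In particular, the histogram is assumed to be a mapping from integer dB offset to frame count, with offset k holding frames whose TMSNR lies between THR4 + k and THR4 + k + 1, and choosing the smallest step that reaches the required count is an assumed policy, not one stated in the text.

```python
def frames_to_move(target_rate, average_rate, w, r_full=14.4, r_half=7.2):
    """Equation 13: half rate frames that should become full rate (negative: the reverse)."""
    return (target_rate - average_rate) * w / (r_full - r_half)


def threshold_step_db(hist, delta, max_step_db=10):
    """Signed whole-dB change to THR4 whose estimated frame shift first reaches |delta|."""
    need = round(abs(delta))
    if need == 0:
        return 0
    sign = 1 if delta > 0 else -1
    moved = 0
    for step in range(1, max_step_db + 1):
        # Raising THR4 by `step` dB frees bins 0..step-1 (half -> full);
        # lowering it captures bins -1..-step (full -> half).
        offset = step - 1 if sign > 0 else -step
        moved += hist.get(offset, 0)
        if moved >= need:
            return sign * step
    return sign * max_step_db  # histogram exhausted: clamp to the largest step
```

For a target of 8.7 kbps, a measured average of 8.34 kbps and W = 400, equation 13 gives (0.36 x 400) / 7.2 = 20 frames, so THR4 would be raised until about 20 half rate frames fall back to full rate.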

It is envisioned that the target rate may be stored in a memory
element of rate determination logic element 14, in which case the target rate
would be a static value in accordance with which the THR4 value would be
dynamically determined. In addition to this initial target rate, it is
envisioned that the communication system may transmit a rate command
signal to the encoding rate selection apparatus based upon current capacity
conditions of the system.

The rate command signal could either specify the target rate or could 
simply request an increase or decrease in the average rate. If the system 
were to specify the target rate, that rate would be used in determining the 
value of THR4 in accordance with equations 12 and 13. If the system 
specified only that the user should transmit at a higher or lower 
transmission rate, then rate determination logic element 14 may respond by
changing the THR4 value by a predetermined increment or may compute 
an incremental change in accordance with a predetermined incremental 
increase or decrease in rate. 
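
The two kinds of rate command can be sketched as follows. The message layout, field names, and the 1 dB default increment are all assumptions for illustration; the text does not define a signal format. The sign convention follows from block 28: raising THR4 lets fewer frames qualify for half rate, which raises the average rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateCommand:
    """Hypothetical rate command: either an explicit target or a direction."""
    target_rate_kbps: Optional[float] = None  # explicit target rate, if given
    direction: int = 0                        # +1 = increase rate, -1 = decrease rate


def apply_rate_command(cmd, thr4_db, target_kbps, step_db=1.0):
    """Return (new THR4 in dB, new target rate in kbps) after handling the command."""
    if cmd.target_rate_kbps is not None:
        # Explicit target: THR4 is left for the histogram-based update to adapt.
        return thr4_db, cmd.target_rate_kbps
    # Relative request: nudge THR4 by a predetermined increment.
    return thr4_db + cmd.direction * step_db, target_kbps
```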

Blocks 22 and 26 indicate a difference in the method of encoding 
speech based upon whether the speech samples represent voiced or 
unvoiced speech. The unvoiced speech is speech in the form of fricatives
and consonant sounds such as "f", "s", "sh", "t" and "z". Quarter rate
voiced speech is temporally masked speech, where a low volume speech
frame follows a relatively high volume speech frame of similar frequency
content. The human ear cannot hear the fine points of the speech in a
low volume frame that follows a high volume frame, so bits can be saved
by encoding this speech at quarter rate.

In the exemplary embodiment of encoding unvoiced quarter rate 
speech, a speech frame is divided into four subframes. All that is 
transmitted for each of the four subframes is a gain value G and the LPC
filter coefficients A(z). In the exemplary embodiment, five bits are
transmitted to represent the gain in each subframe. At a decoder, for
each subframe, a codebook index is randomly selected. The randomly
selected codebook vector is multiplied by the transmitted gain value and
passed through the LPC filter, A(z), to generate the synthesized unvoiced
speech.
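
The decoder side of quarter rate unvoiced coding can be sketched as below. This is an illustration under stated assumptions: the function name and argument layout are hypothetical, "passed through the LPC filter" is read as the all-pole synthesis filter 1/A(z), and A(z) is assumed to follow the convention A(z) = 1 - sum of a_k z^-k.

```python
import random


def synthesize_unvoiced_subframe(gain, lpc, codebook, state, rng=random):
    """Synthesize one unvoiced subframe from a transmitted gain and LPC coefficients.

    lpc      -- coefficients a_1..a_p (assumed convention A(z) = 1 - sum a_k z^-k)
    codebook -- list of excitation vectors; the index is chosen at random
    state    -- last p output samples, most recent first; updated in place
    """
    index = rng.randrange(len(codebook))              # randomly selected codebook index
    excitation = [gain * c for c in codebook[index]]  # scale by the transmitted gain
    output = []
    for x in excitation:
        # All-pole synthesis: y[n] = x[n] + sum_k a_k * y[n - k]
        y = x + sum(a * s for a, s in zip(lpc, state))
        output.append(y)
        state.insert(0, y)
        state.pop()
    return output
```

Because the index is drawn at the decoder, no codebook index bits need to be transmitted for these subframes; only the gain and the filter coefficients are sent.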

In the encoding of voiced quarter rate speech, a speech frame is 
divided into two subframes and the CELP coder determines a codebook 
index and gain for each of the two subframes. In the exemplary
embodiment, five bits are allocated to indicating a codebook index and
another five bits are allocated to specifying a corresponding gain value. In
the exemplary embodiment, the codebook used for quarter rate voiced
encoding is a subset of the vectors of the codebook used for half and full rate
encoding. In the exemplary embodiment, seven bits are used to specify a
codebook index in the full and half rate encoding modes.
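
The bit counts quoted for the two quarter rate modes can be tallied as follows. This is a rough worked calculation that assumes the five-bit index and five-bit gain allocations for voiced speech apply per subframe and that each index value addresses exactly one codebook vector; the bits spent on the LPC coefficients are not stated and are left out.

```latex
\begin{align*}
\text{unvoiced quarter rate, gain bits per frame} &= 4 \times 5 = 20 \\
\text{voiced quarter rate, index and gain bits per frame} &= 2 \times (5 + 5) = 20 \\
\text{quarter rate voiced codebook size} &= 2^{5} = 32 \text{ vectors} \\
\text{half and full rate codebook size} &= 2^{7} = 128 \text{ vectors}
\end{align*}
```

Under these assumptions, the quarter rate voiced codebook is a 32 vector subset of the 128 vector codebook used at the higher rates.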

In Figure 1, the blocks may be implemented as structural blocks to 
perform the designated functions or the blocks may represent functions 
performed in programming of a digital signal processor (DSP) or an
application specific integrated circuit (ASIC). The description of the
functionality of the present invention would enable one of ordinary skill to
implement the present invention in a DSP or an ASIC without undue
experimentation.

The previous description of the preferred embodiments is provided 
to enable any person skilled in the art to make or use the present invention.
The various modifications to these embodiments will be readily apparent to
those skilled in the art, and the generic principles defined herein may be
applied to other embodiments without the use of the inventive faculty.
Thus, the present invention is not intended to be limited to the
embodiments shown herein but is to be accorded the widest scope consistent
with the principles and novel features disclosed herein.

CLAIMS

1. An apparatus for selecting an encoding rate of a predetermined
set of encoding rates for encoding an active frame of speech, comprising:

mode measurement means for generating a set of parameters
indicative of characteristics of said active frame of speech; and

rate determination logic means for receiving said set of
parameters and selecting an encoding rate of a predetermined set of
encoding rates.

2. The apparatus of Claim 1 wherein said set of parameters
comprises a target matching signal to noise ratio measurement indicative of
the match between the input speech and the modeled speech.

3. The apparatus of Claim 1 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech.

4. The apparatus of Claim 1 wherein said set of parameters
comprises a zero crossings count indicative of the presence of high
frequency components in said speech frame.

5. The apparatus of Claim 1 wherein said set of parameters
comprises a prediction gain differential measurement indicative of the
frame to frame stability of the formants.

6. The apparatus of Claim 1 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy.

7. The apparatus of Claim 1 wherein said predetermined set of
encoding rates comprises full rate, half rate, quarter.

8. The apparatus of Claim 1 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech and a zero crossings count indicative of the
presence of high frequency components in said speech frame and wherein
when normalized autocorrelation measurement is below a predetermined
first threshold and said zero crossings count exceeds a second predetermined
threshold said rate determination logic means selects an encoding mode of
quarter rate unvoiced encoding.

9. The apparatus of Claim 1 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy and wherein when a frame energy differential measurement
indicative of changes in energy between the energy of the current frame and
an average frame energy exceeds a predetermined threshold, said rate
determination logic means selects an encoding mode of quarter rate voiced
encoding.

10. The apparatus of Claim 1 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech, a target matching signal to noise ratio
measurement indicative of match between an encoded frame of speech and
an input frame of speech, and a prediction gain differential measurement
indicative of the frame to frame stability of a set of formant parameters in
said encoded speech frame and wherein when normalized autocorrelation
measurement exceeds a predetermined first threshold, said prediction gain
differential exceeds a second predetermined threshold and said normalized
autocorrelation function exceeds a predetermined third threshold said rate
determination logic means selects an encoding mode of half rate encoding.

11. In a communication system wherein a remote station
communicates with a central communication center, a method for
dynamically changing the transmission rate of said remote station
comprising:

mode measurement means for generating a set of parameters
indicative of characteristics of said active frame of speech; and

rate determination logic means for receiving said set of parameters
and for receiving a rate command signal and for generating at least one
threshold value in accordance with said rate command signal, comparing at
least one parameter of said set of parameters with said at least one threshold
value and selecting an encoding rate in accordance with said comparison.

12. An apparatus for selecting an encoding rate of a predetermined
set of encoding rates for encoding an active frame of speech, comprising:

a mode measurement calculator that generates a set of parameters
indicative of characteristics of said active frame of speech; and

a rate determination logic for receiving said set of parameters
and selecting an encoding rate of a predetermined set of encoding rates.

13. The apparatus of Claim 12 wherein said set of parameters
comprises a target matching signal to noise ratio measurement indicative of
the match between the input speech and the modeled speech.

14. The apparatus of Claim 12 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech.

15. The apparatus of Claim 12 wherein said set of parameters
comprises a zero crossings count indicative of the presence of high
frequency components in said speech frame.

16. The apparatus of Claim 12 wherein said set of parameters
comprises a prediction gain differential measurement indicative of the
frame to frame stability of the formants.

17. The apparatus of Claim 12 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy.

18. The apparatus of Claim 12 wherein said predetermined set of
encoding rates comprises full rate, half rate, quarter.

19. The apparatus of Claim 12 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech and a zero crossings count indicative of the
presence of high frequency components in said speech frame and wherein
when normalized autocorrelation measurement is below a predetermined
first threshold and said zero crossings count exceeds a second predetermined
threshold said rate determination logic selects an encoding mode of quarter
rate unvoiced encoding.

20. The apparatus of Claim 12 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy and wherein when a frame energy differential measurement
indicative of changes in energy between the energy of the current frame and
an average frame energy exceeds a predetermined threshold, said rate
determination logic selects an encoding mode of quarter rate voiced
encoding.

21. The apparatus of Claim 12 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech, a target matching signal to noise ratio
measurement indicative of match between an encoded frame of speech and
an input frame of speech, and a prediction gain differential measurement
indicative of the frame to frame stability of a set of formant parameters in
said encoded speech frame and wherein when normalized autocorrelation
measurement exceeds a predetermined first threshold, said prediction gain
differential exceeds a second predetermined threshold and said normalized
autocorrelation function exceeds a predetermined third threshold said rate
determination logic selects an encoding mode of half rate encoding.

22. In a communication system wherein a remote station
communicates with a central communication center, a method for
dynamically changing the transmission rate of said remote station
comprising:

a mode measurement calculator that generates a set of parameters
indicative of characteristics of said active frame of speech; and

a rate determination logic that receives said set of parameters and for
receiving a rate command signal and for generating at least one threshold
value in accordance with said rate command signal, comparing at least one
parameter of said set of parameters with said at least one threshold value
and selecting an encoding rate in accordance with said comparison.

23. A method for selecting an encoding rate of a predetermined set
of encoding rates for encoding an active frame of speech, comprising the
steps of:

generating a set of parameters indicative of characteristics of said
active frame of speech; and

selecting an encoding rate of a predetermined set of encoding rates.

24. The method of Claim 23 wherein said set of parameters
comprises a target matching signal to noise ratio measurement indicative of
the match between the input speech and the modeled speech.

25. The method of Claim 23 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech.

26. The method of Claim 23 wherein said set of parameters
comprises a zero crossings count indicative of the presence of high
frequency components in said speech frame.

27. The method of Claim 23 wherein said set of parameters
comprises a prediction gain differential measurement indicative of the
frame to frame stability of the formants.

28. The method of Claim 23 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy.

29. The method of Claim 23 wherein said predetermined set of
encoding rates comprises full rate, half rate, quarter.

30. The method of Claim 23 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech and a zero crossings count indicative of the
presence of high frequency components in said speech frame and wherein
when normalized autocorrelation measurement is below a predetermined
first threshold and said zero crossings count exceeds a second predetermined
threshold said step of selecting an encoding mode selects quarter rate
unvoiced encoding.

31. The method of Claim 23 wherein said set of parameters
comprises a frame energy differential measurement indicative of changes in
energy between the energy of the current frame and an average frame
energy and wherein when a frame energy differential measurement
indicative of changes in energy between the energy of the current frame and
an average frame energy exceeds a predetermined threshold, said step of
selecting an encoding mode selects quarter rate voiced encoding.

32. The method of Claim 23 wherein said set of parameters
comprises a normalized autocorrelation measurement indicative of the
periodicity in the input speech, a target matching signal to noise ratio
measurement indicative of match between an encoded frame of speech and
an input frame of speech, and a prediction gain differential measurement
indicative of the frame to frame stability of a set of formant parameters in
said encoded speech frame and wherein when normalized autocorrelation
measurement exceeds a predetermined first threshold, said prediction gain
differential exceeds a second predetermined threshold and said normalized
autocorrelation function exceeds a predetermined third threshold said step
of selecting an encoding mode selects half rate encoding.

33. In a communication system wherein a remote station
communicates with a central communication center, a method for
dynamically changing the transmission rate of said remote station
comprising the steps of:

generating a set of parameters indicative of characteristics of said
active frame of speech; and

receiving a rate command signal;

generating at least one threshold value in accordance with said rate
command signal;

comparing at least one parameter of said set of parameters with said
at least one threshold value; and

selecting an encoding rate in accordance with said comparison.